Extracting Knowledge for Cultural Heritage Knowledge Base Population

Author

  • Naimdjon Takhirov
Abstract

The entity-oriented description of the world is a major current trend, motivated by the need for semantic services that support people in finding information, learning and discovering new knowledge, and broadening their existing knowledge horizons. Entities, managed in semantic knowledge bases, have the potential to be the backbone of these new and innovative services. Automatically extracting facts from various data sources and populating knowledge bases is therefore the challenge studied in this work. This thesis proposes methods for knowledge extraction in the cultural heritage domain. Extracting knowledge from cultural heritage metadata is by no means a trivial task, and there are often problems with missing or ambiguous information. An inherent part of this work is therefore dedicated to developing pattern-based techniques that extract knowledge from natural language documents to complement the knowledge extracted from metadata. However, the proposed framework is not limited to working in conjunction with metadata extraction: it also supports an independent, continuous mode of operation, i.e. patterns learned during extraction are subsequently used to mine new knowledge.

In summary, the main contributions of this thesis are:

  • FRBR-ML: a generic framework for exploiting metadata, which includes: (i) a method to extract entities, attributes and relationships from existing legacy metadata, (ii) novel techniques for correction, enhancement and semantic enrichment of the metadata, and (iii) metrics to assess the quality of extraction.
  • SPIDER: a prototype that supports extraction of relational facts at Web scale. Contrary to most knowledge extraction approaches, we tackle the problem of uniquely identifying entities, both to extend their list of spelling forms and to facilitate matching to LOD entities. Furthermore, in addition to the flexible pattern definition scheme, SPIDER enables a provenance-aware extraction method which prudently refines extracted facts by considering the PageRank and SpamScore as well as the relevance score of the source document.
  • KIEV: a prototype that takes the development of SPIDER to the next stage, namely by enabling verification of facts using two evidence-based techniques:

Similar resources

Integrating Wiki Systems, Natural Language Processing, and Semantic Technologies for Cultural Heritage Data Management

Modern documents can easily be structured and augmented to have the characteristics of a semantic knowledge base. Many older documents may also hold a trove of knowledge that would deserve to be organized as such a knowledge base. In this chapter, we show that modern semantic technologies offer the means to make these heritage documents accessible by transforming them into a semantic knowledge ...

An Ontology-Based Method for Extracting and Classifying Domain-Specific Compositional Nominal Compounds

In this paper, we present our preliminary study on an ontology-based method to extract and classify compositional nominal compounds in specific domains of knowledge. This method is based on the assumption that, applying a conceptual model to represent knowledge domain, it is possible to improve the extraction and classification of lexicon occurrences for that domain in a semi-automatic way. We ...

A Novel Vision for Navigation and Enrichment in Cultural Heritage Collections

In the cultural heritage domain, there is huge interest in utilizing semantic web technology and building services that enable users to query, explore and access the vast body of cultural heritage information that has been created over decades by memory institutions. For successful conversion of existing data into semantic web data, however, there is often a need to enhance and enrich the legacy da...

Deliverable D4.3 Demonstrator for final web services

SMARTMUSEUM (Cultural Heritage Knowledge Exchange Platform) is a Research and Development project sponsored under the European Commission’s 7th Framework. The overall objective of the project is to develop a platform for innovative services enhancing on-site personalized access to digital cultural heritage through adaptive and privacy preserving user profiling. Using on-site knowledge database...

Cultural learning across the Smart City

Public involvement in defining and interpreting cultural heritage offers many benefits, including improved learning opportunities for individuals and a broader base of knowledge about art and heritage. This knowledge can in turn be used for better, smarter, information provision in the future. This paper proposes how to capture, analyse and present cultural information from different viewpoints...

Publication date: 2013